<?php
/**
 * <https://y.st./>
 * Copyright © 2018 Alex Yst <mailto:copyright@y.st>
 * 
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * 
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <https://www.gnu.org./licenses/>.
**/

$xhtml = array(
	'<{title}>' => 'Outliers',
	'takedown' => '2017-11-01',
	'<{body}>' => <<<END
<img src="/img/CC_BY-SA_4.0/y.st./weblog/2018/09/21.jpg" alt="A sports field and what looks like some sort of school" class="framed-centred-image" width="649" height="480"/>
<section id="drudgery">
	<h2>Drudgery</h2>
	<p>
		My discussion posts for the day:
	</p>
	<blockquote>
		<p>
			What would I do if the outliers turned out to reflect the real situation?
			If the outliers aren&apos;t a corruption issue, there&apos;s no need to treat them specially.
			It&apos;s just valid data at that point, and should be kept as-is.
			I don&apos;t think I understand your question.
		</p>
	</blockquote>
	<blockquote>
		<p>
			When outliers are caused by actual errors, those outliers are only a symptom of the underlying issue: the presence of errors.
			Therefore, we can&apos;t simply remove the outliers.
			After all, other data points are likely erroneous as well; they just don&apos;t stand out.
			If you recommend not throwing out the data set and starting over, how would you ensure the data set gets cleaned and the erroneous data points removed?
			How would you detect which data points are erroneous, given that these bad data points won&apos;t all be outliers?
		</p>
	</blockquote>
	<blockquote>
		<p>
			I like your example; it shows a different kind of human error than I&apos;d considered myself.
			Grading according to a rubric will usually result in similar grades, but sometimes a grader will fail to use the rubric correctly or will fail to understand the student&apos;s submission as well as other graders did.
			Likewise, sometimes it&apos;s the majority that misses what the student actually did, so only the outlier, or even no one at all, gives the student the correct grade.
			In this case, some or even all of the data points may be erroneous, even without any data entry issues.
		</p>
	</blockquote>
</section>
END
);
